智能论文笔记

2D-Motion Detection using SNNs with Graphene-Insulator-Graphene Memristive Synapses

Shubham Pande , Karthi Srinivasan , Suresh Balanethiram , Bhaswar Chakrabarti , Anjan Chakravorty

分类：神经与进化计算

2021-11-30

尖峰神经网络的事件驱动性质使它们具有生物学上可符合的和比人工神经网络更节能。在这项工作中，我们展示了二维视野中对象的运动检测。这里呈现的网络架构是生物学卓越的，并使用CMOS模拟泄漏整合和灭火神经元和超低功耗多层RRAM突触。具体的跨晶体管纤维Spice模拟表明，所提出的结构可以在二维视野中准确可靠地检测物体的复杂运动。

translated by 谷歌翻译

Analysis and application of multispectral data for water segmentation using machine learning

Shubham Gupta , Uma D. , Ramachandra Hebbar

分类：计算机视觉

2022-12-16

Monitoring water is a complex task due to its dynamic nature, added pollutants, and land build-up. The availability of high-resolu-tion data by Sentinel-2 multispectral products makes implementing remote sensing applications feasible. However, overutilizing or underutilizing multispectral bands of the product can lead to inferior performance. In this work, we compare the performances of ten out of the thirteen bands available in a Sentinel-2 product for water segmentation using eight machine learning algorithms. We find that the shortwave infrared bands (B11 and B12) are the most superior for segmenting water bodies. B11 achieves an overall accuracy of $71\%$ while B12 achieves $69\%$ across all algorithms on the test site. We also find that the Support Vector Machine (SVM) algorithm is the most favourable for single-band water segmentation. The SVM achieves an overall accuracy of $69\%$ across the tested bands over the given test site. Finally, to demonstrate the effectiveness of choosing the right amount of data, we use only B11 reflectance data to train an artificial neural network, BandNet. Even with a basic architecture, BandNet is proportionate to known architectures for semantic and water segmentation, achieving a $92.47$ mIOU on the test site. BandNet requires only a fraction of the time and resources to train and run inference, making it suitable to be deployed on web applications to run and monitor water bodies in localized regions. Our codebase is available at https://github.com/IamShubhamGupto/BandNet.

translated by 谷歌翻译

Imitation Learning based Auto-Correction of Extrinsic Parameters for A Mixed-Reality Setup

Shubham Sonawani , Yifan Zhou , Heni Ben Amor

分类：机器人

2022-12-16

In this paper, we discuss an imitation learning based method for reducing the calibration error for a mixed reality system consisting of a vision sensor and a projector. Unlike a head mounted display, in this setup, augmented information is available to a human subject via the projection of a scene into the real world. Inherently, the camera and projector need to be calibrated as a stereo setup to project accurate information in 3D space. Previous calibration processes require multiple recording and parameter tuning steps to achieve the desired calibration, which is usually time consuming process. In order to avoid such tedious calibration, we train a CNN model to iteratively correct the extrinsic offset given a QR code and a projected pattern. We discuss the overall system setup, data collection for training, and results of the auto-correction model.

translated by 谷歌翻译

Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation

Yifan Zhou , Shubham Sonawani , Mariano Phielipp , Simon Stepputtis , Heni Ben Amor

分类：机器人

2022-12-08

Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment with regards to time and compute resources. Still, the resulting controllers are highly device-specific and cannot easily be transferred to a robot with different morphology, capability, appearance or dynamics. In this paper, we propose a sample-efficient approach for training language-conditioned manipulation policies that allows for rapid transfer across different types of robots. By introducing a novel method, namely Hierarchical Modularity, and adopting supervised attention across multiple sub-modules, we bridge the divide between modular and end-to-end learning and enable the reuse of functional building blocks. In both simulated and real world robot manipulation experiments, we demonstrate that our method outperforms the current state-of-the-art methods and can transfer policies across 4 different robots in a sample-efficient manner. Finally, we show that the functionality of learned sub-modules is maintained beyond the training process and can be used to introspect the robot decision-making process. Code is available at https://github.com/ir-lab/ModAttn.

translated by 谷歌翻译

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

Zhizhuo Zhou , Shubham Tulsiani

分类：计算机视觉

2022-12-01

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.

translated by 谷歌翻译

Sign Language to Text Conversion in Real Time using Transfer Learning

Shubham Thakar , Samveg Shah , Bhavya Shah , Anant V. Nimkar

分类：计算机视觉 | 机器学习

2022-11-13

The people in the world who are hearing impaired face many obstacles in communication and require an interpreter to comprehend what a person is saying. There has been constant scientific research and the existing models lack the ability to make accurate predictions. So we propose a deep learning model trained on ASL i.e. American Sign Language which will take actions in the form of ASL as input and translate it into text. To achieve the translation a Convolution Neural Network model and a transfer learning model based on the VGG16 architecture are used. There has been an improvement in accuracy from 94% of CNN to 98.7% of Transfer Learning, an improvement of 5%. An application with the deep learning model integrated has also been built.

translated by 谷歌翻译

Going for GOAL: A Resource for Grounded Football Commentaries

Alessandro Suglia , José Lopes , Emanuele Bastianelli , Andrea Vanzo , Shubham Agarwal , Malvina Nikandrou , Lu Yu , Ioannis Konstas , Verena Rieser

分类：计算机视觉 | 自然语言处理

2022-11-08

Recent video+language datasets cover domains where the interaction is highly structured, such as instructional videos, or where the interaction is scripted, such as TV shows. Both of these properties can lead to spurious cues to be exploited by models rather than learning to ground language. In this paper, we present GrOunded footbAlL commentaries (GOAL), a novel dataset of football (or `soccer') highlights videos with transcribed live commentaries in English. As the course of a game is unpredictable, so are commentaries, which makes them a unique resource to investigate dynamic language grounding. We also provide state-of-the-art baselines for the following tasks: frame reordering, moment retrieval, live commentary retrieval and play-by-play live commentary generation. Results show that SOTA models perform reasonably well in most tasks. We discuss the implications of these results and suggest new tasks for which GOAL can be used. Our codebase is available at: https://gitlab.com/grounded-sport-convai/goal-baselines.

translated by 谷歌翻译

An Ensemble-based approach for assigning text to correct Harmonized system code

Shubham , Avinash Arya , Subarna Roy , Sridhar Jonnala

分类：人工智能

2022-11-08

Industries must follow government rules and regulations around the world to classify products when assessing duties and taxes for international shipment. Harmonized System (HS) is the most standardized numerical method of classifying traded products among industry classification systems. A hierarchical ensemble model comprising of Bert- transformer, NER, distance-based approaches, and knowledge-graphs have been developed to address scalability, coverage, ability to capture nuances, automation and auditing requirements when classifying unknown text-descriptions as per HS method.

translated by 谷歌翻译

B2B Advertising: Joint Dynamic Scoring of Account and Users

Atanu R. Sinha , Gautam Choudhary , Mansi Agarwal , Shivansh Bindal , Abhishek Pande , Camille Girabawe

分类：机器学习

2022-09-28

当一家企业向另一家企业（B2B）出售时，购买业务由一组称为帐户的个人代表，他们共同决定是否购买。卖方向每个人做广告，并与他们互动，主要是通过数字方式进行的。销售周期很长，通常在几个月内。在寻求信息时，属于帐户的个人之间存在异质性，因此卖方需要在漫长的视野中对每个人的利益进行评分，以决定必须达到哪些人以及何时达到。此外，购买决定与帐户有关，必须进行评分才能投射购买的可能性，这一决定可能会一直变化，直到实际的决定，象征组决策。我们以动态的方式为帐户及其个人的决定分数。动态评分允许机会在长时间的不同时间点影响不同的单个成员。数据集包含与卖方的每个人通信活动的行为日志；但是，没有关于个人之间咨询的数据，这导致了决定。使用神经网络体系结构，我们提出了几种方法来汇总各个成员活动的信息，以预测该小组的集体决策。多次评估发现了强大的模型性能。

translated by 谷歌翻译

Detecting Crop Burning in India using Satellite Data

Kendra Walker , Ben Moscona , Kelsey Jack , Seema Jayachandran , Namrata Kala , Rohini Pande , Jiani Xue , Marshall Burke

分类：计算机视觉 | 机器学习

2022-09-21

农作物残留物燃烧是世界许多地方的空气污染的主要来源，尤其是南亚。政策制定者，从业人员和研究人员都投资了衡量影响和制定干预措施以减少燃烧。但是，测量燃烧的影响或干预措施的有效性减少燃烧需要数据燃烧的位置。这些数据在成本和可行性方面都在现场收集具有挑战性。我们利用印度旁遮普邦旁遮普邦农作物残留物燃烧的地面监测的数据，以探索使用可访问的卫星图像是否可以更有效地检测到燃烧。具体而言，我们使用了具有高时间分辨率（最多每天）的3M Planetscope数据以及具有每周时间分辨率但光谱信息深度的公共可用Sentinel-2数据。在分析了不同光谱带和燃烧指数单独分离燃烧和未燃烧图的能力之后，我们构建了一个随机森林模型，这些模型确定提供了最大的分离性，并用地面验证的数据评估了模型性能。鉴于测量所带来的挑战，我们的总体模型精度为82％是有利的。基于此过程的见解，我们讨论了检测卫星图像中农作物残留物燃烧的技术挑战，以及衡量燃烧和政策干预措施的影响的挑战。

translated by 谷歌翻译